Evaluation of OpenMP for the Cyclops Multithreaded Architecture

نویسندگان

  • George Almási
  • Eduard Ayguadé
  • Calin Cascaval
  • José G. Castaños
  • Jesús Labarta
  • Francisco Martínez
  • Xavier Martorell
  • José E. Moreira
چکیده

Multithreaded architectures have the potential of tolerating large memory and functional unit latencies and increase resource utilization. The Blue Gene/Cyclops architecture, being developed at the IBM T. J. Watson Research Center, is one such systems that offers massive intra-chip parallelism. Although the BG/C architecture was initially designed to execute specific applications, we believe that it can be effectively used on a broad range of parallel numerical applications. Programming such applications for this unconventional design requires a significant porting effort when using the basic built-in mechanisms for thread management and synchronization. In this paper, we describe the implementation of an OpenMP environment for parallelizing applications, currently under development at the CEPBA-IBM Research Institute, targeting BG/C. The environment is evaluated with a set of simple numerical kernels and a subset of the NAS OpenMP benchmarks. We identify issues that were not initially considered in the design of the BG/C architecture to support a programming model such as OpenMP. We also evaluate features currently offered by the BG/C architecture that should be considered in the implementation of an efficient OpenMP layer for massive intra-chip parallel architectures.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the IBM Cyclops-64 Multithreaded Architecture

A trend of emerging large-scale multi-core chip design is to employ multithreaded architectures such as the IBM Cyclops-64 (C64) chip that integrates large number of hardware thread units, main memory banks and communication hardwares on a single chip. A cellular supercomputer is being developed based on a 3D connection of the C64 chips. This paper introduces our design, implementation, and eva...

متن کامل

Optimization and Scaling of Multiple Sequence Alignment Software ClustalW on Intel Xeon Phi

This work is aimed to investigate and to improve the performance of multiple sequence alignment software ClustalW on the test platform EURORA at CINECA, for the case study of the influenza virus sequences. The objective is code optimization, porting, scaling and performance evaluation of parallel multiple sequence alignment software ClustalW for Intel Xeon Phi (the MIC architecture). For this p...

متن کامل

Performance Evaluation of OpenMP Benchmarks on Intel's Quad Core Processors

Multi-core processors are widely used in desktop and server applications and are now appearing in real-time embedded applications. We are investigating optimal configurations of some of the available multicore processors suitable for developing real-time software for a multithreaded application used for pavement performance measurements. For the application discussed in this paper we are consid...

متن کامل

An Implementation and Evaluation of Thread Subteam for OpenMP Extensions

OpenMP provides a portable programming interface on shared memory multiprocessors (SMPs). The set of features in the current OpenMP specification provides essential functionality that was selected mostly from existing shared-memory parallel application programming interfaces (APIs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the ste...

متن کامل

Evaluation of a Multithreaded Architecture for Cellular Computing

Cyclops is a new architecture for high performance parallel computers being developed at the IBM T. J. Watson Research Center. The basic cell of this architecture is a single-chip SMP system with multiple threads of execution, embedded memory, and integrated communications hardware. Massive intra-chip parallelism is used to tolerate memory and functional unit latencies. Large systems with thous...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003